OPTIMAL SUBSAMPLING ALGORITHMS FOR BIG DATA REGRESSIONS
Authors
Abstract
To quickly approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the estimator from a general subsampling algorithm are established, and optimal subsampling probabilities under the A- and L-optimality criteria are derived. Furthermore, using Frobenius norm matrix concentration inequalities, finite sample properties of the subsample estimator based on the optimal subsampling probabilities are also derived. Since the optimal subsampling probabilities depend on the full data estimate, an adaptive two-step algorithm is developed. Asymptotic normality and optimality of the estimator from this adaptive algorithm are established. The proposed methods are illustrated and evaluated through numerical experiments on simulated and real datasets.
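The two-step idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes logistic regression as the generalized linear model, uses probabilities proportional to |y_i - p_i| * ||x_i|| as an L-optimality-style choice, and fits the final estimate on the second-stage subsample only. The function names `weighted_logistic_mle` and `two_step_subsample` and the parameters `r0` and `r` are hypothetical.

```python
import numpy as np


def weighted_logistic_mle(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson maximizer of the weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, rng):
    """Adaptive two-step subsampling sketch (assumed logistic model).

    Step 1: uniform pilot subsample of size r0 gives a pilot estimate.
    Step 2: the pilot estimate is plugged into the subsampling
    probabilities, r points are drawn with replacement, and an
    inverse-probability-weighted MLE is fitted on that subsample.
    """
    n = X.shape[0]

    # Step 1: pilot estimate from a uniform subsample.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Step 2: approximate probabilities, assumed proportional to
    # |y_i - p_i(beta_pilot)| * ||x_i|| (an L-optimality-style choice).
    p = 1.0 / (1.0 + np.exp(-(X @ beta_pilot)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()

    idx = rng.choice(n, size=r, replace=True, p=pi)
    w = 1.0 / pi[idx]
    w = w / w.mean()  # rescaling leaves the maximizer unchanged
    beta_hat = weighted_logistic_mle(X[idx], y[idx], w)
    return beta_hat, pi
```

The inverse-probability weights are what keep the subsample estimator targeting the full-data maximum likelihood estimate even though informative points are deliberately oversampled.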
Similar resources
Design of Algorithms for Big Data Analytics
Very large volumes of data are being collected in every domain of human endeavour. The term Big Data has become pervasive recently and denotes the problems related to the storage, management, and analysis of these large amounts of data. Business, sciences, and engineering are all becoming dependent on their ability to harness useful knowledge from very large datasets. For example, the research ...
Adaptive Caching Algorithms for Big Data Systems
Today’s Big Data platforms have enabled the democratization of data by allowing data sharing among various data processing frameworks and applications that run in the same platform. This data and resource sharing, combined with the fact that most applications tend to access a hot set of the data, has led to the development of external, in-memory, distributed caching frameworks. In this paper, we...
MapReduce Algorithms for Big Data Analysis
There is a growing trend of applications that should handle big data. However, analyzing big data is a very challenging problem today. For such applications, the MapReduce framework has recently attracted a lot of attention. Google’s MapReduce or its open-source equivalent Hadoop is a powerful tool for building such applications. In this tutorial, we will introduce the MapReduce framework based...
Parallel Algorithms for Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as well as virtually all pos...
CS 598CSC: Algorithms for Big Data
Suppose we have a stream a1, a2, . . . , an of objects from an ordered universe. For simplicity we will assume that they are real numbers and, moreover, that they are distinct. We would like to find the k’th ranked element for some 1 ≤ k ≤ n. In particular we may be interested in the median element. We will discuss exact and approximate versions of these problems. Another termin...
Journal
Journal title: Statistica Sinica
Year: 2021
ISSN: 1017-0405, 1996-8507
DOI: https://doi.org/10.5705/ss.202018.0439